Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle footnote names that have been parsed into multiple nodes #311

Merged
merged 5 commits into from
Jun 7, 2023

Conversation

digitalmoksha
Copy link
Collaborator

For example [^_foo] gives ^, _, and foo.

Related cmark-gfm PR: github/cmark-gfm#229

Related issue: #306

@digitalmoksha digitalmoksha marked this pull request as draft June 1, 2023 00:43
@digitalmoksha digitalmoksha force-pushed the bw-fix-footnote-special-chars branch from 05a4943 to 174ef95 Compare June 1, 2023 03:52
@digitalmoksha digitalmoksha marked this pull request as ready for review June 1, 2023 03:52
@digitalmoksha digitalmoksha force-pushed the bw-fix-footnote-special-chars branch from 174ef95 to 56aa94c Compare June 1, 2023 04:14
@digitalmoksha
Copy link
Collaborator Author

digitalmoksha commented Jun 1, 2023

@kivikakk I think this one is ready for review.

I think this PR also addresses #271, so added a test for that.

You mentioned in #308 (comment) about using an enum for casing - the second commit is my attempt at that. Feel free to tweak as you desire.

EDIT:

I think this PR also addresses #271, so added a test for that.

I didn't adjust the lexer at all, it mostly seems to be happening naturally. 🤷

Copy link
Owner

@kivikakk kivikakk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is excellent. 🤍 I'll run the fuzzer specifically on the footnote parsing for a few hours before merging.

src/strings.rs Outdated
Comment on lines 262 to 265
if Case::Preserve == casing {
v.push(c);
} else {
v.push_str(&c.to_lowercase().to_string());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also consider something like this in future, I think it's probably considered more "idiomatic Rust":

Suggested change
if Case::Preserve == casing {
v.push(c);
} else {
v.push_str(&c.to_lowercase().to_string());
match casing {
Case::Preserve => v.push(c),
Case::DontPreserve => v.push_str(&c.to_lowercase().to_string()),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks. I went ahead and applied this to the original commit and force pushed.

@kivikakk
Copy link
Owner

kivikakk commented Jun 6, 2023

This ran into an issue with fuzzing almost immediately:

thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 0', /home/kivikakk/Code/comrak/src/parser/inlines.rs:1226:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==898534== ERROR: libFuzzer: deadly signal
    #0 0x5626beaf9681  (/home/kivikakk/Code/comrak/fuzz/target/x86_64-unknown-linux-gnu/release/gfm_footnotes+0xd12681) (BuildId: e3da9d09f38f242115f79afb534ea3e10bd45d46)
    #1 0x5626bfd31000  (/home/kivikakk/Code/comrak/fuzz/target/x86_64-unknown-linux-gnu/release/gfm_footnotes+0x1f4a000) (BuildId: e3da9d09f38f242115f79afb534ea3e10bd45d46)
    #2 0x5626bfcff965  (/home/kivikakk/Code/comrak/fuzz/target/x86_64-unknown-linux-gnu/release/gfm_footnotes+0x1f18965) (BuildId: e3da9d09f38f242115f79afb534ea3e10bd45d46)

[...]

SUMMARY: libFuzzer: deadly signal
MS: 1 PersAutoDict- DE: "\012\000\000\000\000\000\000\000"-; base unit: 6ef5c9b29fff72fbfb9580af4477e29e40865d54
0x2f,0x5b,0x9,0xa,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x9,0x2f,0x40,0xe,0x5d,0x7e,
/[\011\012\000\000\000\000\000\000\000\011/@\016]~
artifact_prefix='/home/kivikakk/Code/comrak/fuzz/artifacts/gfm_footnotes/'; Test unit written to /home/kivikakk/Code/comrak/fuzz/artifacts/gfm_footnotes/crash-34e7e49cbeb94d0f71aa5292100c1f8f6ff39053
Base64: L1sJCgAAAAAAAAAJL0AOXX4=
stat::number_of_executed_units: 1189
stat::average_exec_per_sec:     0
stat::new_units_added:          258
stat::slowest_unit_time_sec:    0
stat::peak_rss_mb:              98
INFO: exiting: 77 time: 8s

────────────────────────────────────────────────────────────────────────────────

Failing input:

        fuzz/artifacts/gfm_footnotes/crash-34e7e49cbeb94d0f71aa5292100c1f8f6ff39053

Output of `std::fmt::Debug`:

        "/[\t\n\0\0\0\0\0\0\0\t/@\u{e}]~"

Reproduce with:

        cargo fuzz run gfm_footnotes fuzz/artifacts/gfm_footnotes/crash-34e7e49cbeb94d0f71aa5292100c1f8f6ff39053

Minimize test case with:

        cargo fuzz tmin gfm_footnotes fuzz/artifacts/gfm_footnotes/crash-34e7e49cbeb94d0f71aa5292100c1f8f6ff39053

────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with exit status: 77

I've pushed a commit that includes the fuzzer for you to run, too.

In this case, it looks like we presume the presence of a text node also implies it's non-empty, but isn't always. I've made the fix for this and hit another one:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /home/kivikakk/Code/comrak/src/parser/inlines.rs:899:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==899078== ERROR: libFuzzer: deadly signal
    #0 0x5653b0c12681  (/home/kivikakk/Code/comrak/fuzz/target/x86_64-unknown-linux-gnu/release/gfm_footnotes+0xd12681) (BuildId: 2c08ec3b468305f79faad03c267918d049ba08be)
    #1 0x5653b1e4a020  (/home/kivikakk/Code/comrak/fuzz/target/x86_64-unknown-linux-gnu/release/gfm_footnotes+0x1f4a020) (BuildId: 2c08ec3b468305f79faad03c267918d049ba08be)

[...]

SUMMARY: libFuzzer: deadly signal
MS: 2 ChangeBit-ChangeByte-; base unit: 40577977c0f5ddca5ec4bcb62ba5f20488bb5cc3
0x5b,0x5e,0x3c,0x7e,0x3c,0x61,0x20,0x74,0x28,0x61,0x20,0x74,0x7a,0x7e,0x5d,0x0,
[^<~<a t(a tz~]\000
artifact_prefix='/home/kivikakk/Code/comrak/fuzz/artifacts/gfm_footnotes/'; Test unit written to /home/kivikakk/Code/comrak/fuzz/artifacts/gfm_footnotes/crash-c4e4bded98ca97794351be0834a7c5d57ff3426c
Base64: W148fjxhIHQoYSB0en5dAA==
stat::number_of_executed_units: 25388
stat::average_exec_per_sec:     8462
stat::new_units_added:          1216
stat::slowest_unit_time_sec:    0
stat::peak_rss_mb:              408
INFO: exiting: 77 time: 4s

────────────────────────────────────────────────────────────────────────────────

Failing input:

        fuzz/artifacts/gfm_footnotes/crash-c4e4bded98ca97794351be0834a7c5d57ff3426c

Output of `std::fmt::Debug`:

        "[^<~<a t(a tz~]\0"

Reproduce with:

        cargo fuzz run gfm_footnotes fuzz/artifacts/gfm_footnotes/crash-c4e4bded98ca97794351be0834a7c5d57ff3426c

Minimize test case with:

        cargo fuzz tmin gfm_footnotes fuzz/artifacts/gfm_footnotes/crash-c4e4bded98ca97794351be0834a7c5d57ff3426c

────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with exit status: 77

This is getting thorny quickly, so I'll leave more to you. The repro case is as follows:

00000000: 5b5e 3c7e 3c61 2074 2861 2074 7a7e 5d00  [^<~<a t(a tz~].

You can pipe this text into xxd -r to get the test case.

To run the fuzzer yourself, the following command runs it (on six cores):

cargo +nightly fuzz run gfm_footnotes -j 6

You'll need to install a nightly Rust toolchain and cargo +nightly install cargo-fuzz.

@digitalmoksha digitalmoksha force-pushed the bw-fix-footnote-special-chars branch from 0d6a5c9 to 014d341 Compare June 6, 2023 21:21
@digitalmoksha digitalmoksha force-pushed the bw-fix-footnote-special-chars branch from 014d341 to e80ed0e Compare June 6, 2023 21:23
@digitalmoksha
Copy link
Collaborator Author

So the delimiter stack was being traversed, which was incorrect since the footnote name was collapsed into one text node. The latest commit fixes this. Ran fuzzing for 2 hours with no problem.

@digitalmoksha digitalmoksha marked this pull request as draft June 6, 2023 23:01
@digitalmoksha digitalmoksha marked this pull request as ready for review June 7, 2023 00:44
@kivikakk
Copy link
Owner

kivikakk commented Jun 7, 2023

Thanks so much, this looks great!

@kivikakk kivikakk merged commit 32d36dc into kivikakk:main Jun 7, 2023
@digitalmoksha digitalmoksha deleted the bw-fix-footnote-special-chars branch June 7, 2023 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants